feat(test_case): make trace_dict public for post-hoc agentic evaluation #2600

Open
tiffanychum wants to merge 1 commit into confident-ai:main from tiffanychum:feat/public-trace-dict-for-posthoc-eval

Summary

  • LLMTestCase._trace_dict → public trace_dict field: because the attribute was private, a pre-recorded trace could not be passed at construction time, so the four agentic trace metrics (TaskCompletionMetric, StepEfficiencyMetric, PlanQualityMetric, PlanAdherenceMetric) only worked with @observe at runtime.
  • Alias support: serialization_alias="traceDict" plus validation_alias=AliasChoices("traceDict", "trace_dict"), so both snake_case and camelCase work in model_validate and JSON round-trips (consistent with all other fields on LLMTestCase).
  • All internal usages updated: the 6 assignment sites in evaluate/execute.py and all 4 metric files now use .trace_dict instead of ._trace_dict; the runtime @observe path is unchanged.
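
For reviewers who want to see the alias behaviour in isolation, here is a minimal sketch. `PublicTraceCase` is a hypothetical stand-in, not the real `LLMTestCase`, and the trace shape is made up; only the `Field(...)` arguments are taken from this PR.

```python
from typing import Any, Dict, Optional

from pydantic import AliasChoices, BaseModel, Field


class PublicTraceCase(BaseModel):
    """Hypothetical stand-in for LLMTestCase with a public trace field."""

    input: str
    trace_dict: Optional[Dict[str, Any]] = Field(
        default=None,
        serialization_alias="traceDict",
        validation_alias=AliasChoices("traceDict", "trace_dict"),
    )


# Illustrative trace only; this is not deepeval's actual trace schema.
trace = {"name": "agent", "children": []}

# snake_case keyword at construction time
by_snake = PublicTraceCase(input="hi", trace_dict=trace)

# camelCase key when validating a dict payload
by_camel = PublicTraceCase.model_validate({"input": "hi", "traceDict": trace})

# Dumping with by_alias=True emits the camelCase serialization alias,
# and that payload validates back into an equal model.
payload = by_snake.model_dump(by_alias=True)
restored = PublicTraceCase.model_validate(payload)
```

Listing both spellings in `AliasChoices` is what keeps the snake_case keyword usable once a validation alias is set.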

Motivation

The non-trace evaluation path in task_completion.py is already marked:

# TODO: Deprecate this soon

But until now there was no way to reach the trace path without @observe at runtime. This PR closes that gap, enabling:

  • Offline / batch evaluation from saved logs or databases
  • CI trace replay without re-running the agent
  • Third-party pipelines where you can't decorate the agent code
  • Post-mortem analysis of production traces
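
As a rough illustration of the offline case, the sketch below loads saved records from a JSON Lines file and validates each one into a model with a public trace field. `SavedCase`, `load_cases`, the file name, and the record contents are all assumptions made for this example; in deepeval the model would be `LLMTestCase` and the loaded cases would then be handed to the trace metrics.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional

from pydantic import AliasChoices, BaseModel, Field


class SavedCase(BaseModel):
    """Hypothetical stand-in for a test case rebuilt from a stored log."""

    input: str
    actual_output: Optional[str] = None
    trace_dict: Optional[Dict[str, Any]] = Field(
        default=None,
        serialization_alias="traceDict",
        validation_alias=AliasChoices("traceDict", "trace_dict"),
    )


def load_cases(path: Path) -> List[SavedCase]:
    """Read one JSON record per line and validate each into a SavedCase."""
    with path.open() as fh:
        return [SavedCase.model_validate_json(line) for line in fh if line.strip()]


# Made-up records: one written with the camelCase key, one with snake_case.
records = [
    {"input": "book a flight", "actual_output": "booked",
     "traceDict": {"name": "agent", "children": []}},
    {"input": "cancel it", "actual_output": "cancelled",
     "trace_dict": {"name": "agent", "children": []}},
]

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "traces.jsonl"
    log_path.write_text("\n".join(json.dumps(r) for r in records))
    cases = load_cases(log_path)

for case in cases:
    print(case.input, case.trace_dict is not None)
```

No agent code runs here: the trace comes entirely from the stored record, which is the property the use cases above depend on.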

Changes

| File | What changed |
| --- | --- |
| deepeval/test_case/llm_test_case.py | _trace_dict: PrivateAttr → trace_dict: Field(...) with aliases |
| deepeval/evaluate/execute.py | 6 internal assignments updated to public field |
| deepeval/metrics/task_completion/task_completion.py | _trace_dict → trace_dict (4 references) |
| deepeval/metrics/step_efficiency/step_efficiency.py | _trace_dict → trace_dict (6 references) |
| deepeval/metrics/plan_quality/plan_quality.py | _trace_dict → trace_dict (4 references) |
| deepeval/metrics/plan_adherence/plan_adherence.py | _trace_dict → trace_dict (8 references) |
| tests/test_core/test_test_case/test_single_turn.py | Updated assertions + new test_trace_dict_constructor_and_alias test |
| examples/tracing/test_posthoc_evaluation.py | New example showing post-hoc evaluation from a saved trace |

Test plan

  • python -m pytest tests/test_core/test_test_case/test_single_turn.py -v — all existing tests pass, new alias test passes
  • python examples/tracing/test_posthoc_evaluation.py — post-hoc evaluation runs end-to-end with a pre-recorded trace dict
  • Existing @observe runtime path: no behaviour change (assignments in execute.py write to the same field)

Related

Parallel to issue #2579 (Turn._mcp_interaction PrivateAttr bug) — same root cause: Pydantic v2 PrivateAttr fields can't be set via the constructor.
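
The root cause can be reproduced with a small model. This is a sketch assuming Pydantic v2 with the default model config, where (as far as I understand) an unknown keyword such as `_trace_dict` is ignored rather than rejected; `PrivateTraceCase` is a hypothetical stand-in for the old `LLMTestCase` shape.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, PrivateAttr


class PrivateTraceCase(BaseModel):
    """Hypothetical stand-in for the pre-PR model with a private trace."""

    input: str
    _trace_dict: Optional[Dict[str, Any]] = PrivateAttr(default=None)


trace = {"name": "agent"}

# Private attributes are not part of validation, so this keyword never
# reaches the attribute; it keeps its PrivateAttr default instead.
case = PrivateTraceCase(input="hi", _trace_dict=trace)
constructor_set_it = case._trace_dict == trace

# Assigning after construction works, which is the pattern the runtime
# path relied on before this PR.
case._trace_dict = trace
assigned_after = case._trace_dict == trace

# Private attributes are also left out of serialization.
dumped = case.model_dump()
```

Under those assumptions `constructor_set_it` ends up `False` and `assigned_after` ends up `True`, which is why a public field is needed to accept a trace at construction time.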

…evaluation

`LLMTestCase._trace_dict` was a `PrivateAttr`, making it impossible to
pass a pre-recorded trace at construction time. This meant the four
agentic trace metrics (TaskCompletion, StepEfficiency, PlanQuality,
PlanAdherence) could only be used with `@observe` at runtime — not from
saved logs, CI replay, or third-party pipelines.

Changes:
- `LLMTestCase._trace_dict` → public `trace_dict` field (`Field(...)`)
  with `serialization_alias="traceDict"` and
  `validation_alias=AliasChoices("traceDict", "trace_dict")` so both
  snake_case and camelCase work in `model_validate` / JSON round-trips.
- All internal assignments in `evaluate/execute.py` (6 sites) updated
  from `._trace_dict =` to `.trace_dict =` — runtime @observe path is
  unchanged.
- All four agentic metrics updated from `test_case._trace_dict` to
  `test_case.trace_dict`.
- Unit tests in `test_single_turn.py` updated and a new
  `test_trace_dict_constructor_and_alias` test added.
- New example added at `examples/tracing/test_posthoc_evaluation.py`
  showing post-hoc evaluation from a pre-recorded trace dict.

Fixes the post-hoc evaluation gap noted in the TODO comments in
task_completion.py: the non-trace path is already marked
`# TODO: Deprecate this soon`; this PR makes the trace path accessible
without runtime instrumentation.
vercel Bot commented Apr 4, 2026

@tiffanychum is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.
